Abstract

In this contribution to the 2002 Växjö conference on the foundations
of quantum mechanics and probability, I discuss three issues connected to
Bell's theorem and Bell-CHSH-type experiments: time and the memory
loophole, finite statistics (how wide are the error bars, under local realism?),
and the question of whether a loophole-free experiment is feasible, a
surprising omission on Bell's list of four positions to hold in the light of
his results. Lévy's (1935) theory of martingales, and Fisher's (1935) theory of
randomization in experimental design, take care of time and of finite
statistics. I exploit a (classical) computer network metaphor for local
realism to argue that Bell's conclusions are independent of how one likes to
interpret probability. I give a critique of some recent anti-Bellist literature.

1. Introduction

It has always amazed me that anyone could find fault with Bell (1964).
Quantum mechanics cannot be cast into a classical mold. Well, isn't that
delightful? Don't Bohr, von Neumann, Feynman, all tell us this, each in their
own way? Why else are we fascinated by quantum mechanics? Moreover,
Bell writes with such economy, originality, modesty, and last but not least,
humour.

I want to make it absolutely clear that I do not think that quantum
mechanics is non-local. Bell also made it clear that his work did not prove that.
In fact, in Bell (1981), the final section of the paper on Bertlmann's famous
socks (chapter 16 of Bell (1987)), he gave a list of four quite different
positions one could take, each one logically consistent with his mathematical
results. One of them is simply not to care: go with Bohr, don't look for
anything behind the scenes, for if you do you will get stuck in meaningless
paradoxes, meaningless because there is no necessity for anything behind the
scenes. If, however, like Bell himself, you have a personal preference for
imagining a realistic world behind the scenes, accept with Bell that it must
be non-local. You will be in excellent company: with Bohm-Hiley, with
Ghirardi-Rimini-Weber (the continuous spontaneous localization model), and
no doubt with others. Alternatively, accept even worse consequences, on
which more later.

However, at Växjö the anti-Bellists seemed to form a vociferous majority,
though each anti-Bellist position seemed to me to be at odds with every
other one. All the same, I will in this paper outline a recent positive
development: namely, a strengthening of Bell's inequality. This strengthening
does not strengthen Bell's theorem (quantum mechanics is incompatible with
local realism), but it does strengthen experimental evidence for the ultimately
more interesting conclusion: laboratory reality is incompatible with local
realism.

You may have a completely different idea in your head from mine as to 
what the phrases local realism and quantum mechanics stand for. As also 
was made clear at Växjö, a million and one different interpretations exist for
each. Moreover these interpretations depend on interpretations of yet other 
basic concepts such as probability. However let me describe my concrete 
mathematical results first, and turn to the philosophy later. After that, I will 
discuss some (manifestly or not) anti-Bellist positions, in particular those of 
Accardi, Hess and Philipp, 't Hooft, Khrennikov, Kracklauer, and Volovich.

I mentioned above that Bell (1981) lists four possible positions to hold,
each one logically consistent with his mathematical results. Naturally they 
were not meant to be exhaustive and exclusive, but still I am surprised that 
he missed a to my mind very interesting fifth possibility: namely, that any 
experiment which quantum mechanics itself allows us to do, of necessity 
contains a loophole, preventing one from drawing a watertight conclusion 
against local realism. Always, because of quantum mechanics, it will be 
possible to come up with a local-realistic explanation (but each time, a dif- 
ferent one). This logical possibility has some support from Volovich's re- 
cent findings, and moreover makes 't Hooft's enterprise less hopeless than 
the other four possibilities would suggest. (I understand that Ian Percival
has earlier promoted a similar point of view.)

Personally, I do not have a preference for this position either, but put
it forward in order to urge the experimentalists to go ahead and prove me
wrong. It is a pity that the prevailing opinion, that the loophole issue is dead
since each different loophole has been closed in a different experiment, is
a powerful social disincentive against investing one's career in doing the
definitive (loophole-free) experiment.

2. A Computer Network Metaphor 

To me, "local realism" is something which I can understand. And what I 
can understand are computers (idealised, hence perfect, classical computers) 
whose state at any moment is one definite state out of some extremely large 
(albeit finite) number, and whose state changes according to definite rules at 
discrete time points. Computers can be connected to one another and send 
one another messages. Again, this happens at discrete time points and the 
messages are large but discrete. Computers have memories and hard disks, 
on which can be stored huge quantities of information. One can store data 
and programs on computers. In fact what we call a program is just data for 
another program (and that is just data for another program . . . but not ad 
infinitum). 

Computers can simulate randomness. Alternatively one can, in advance, 
generate random numbers in any way one likes and store them on the hard 
disk of one's computer. With a large store of outcomes of fair coin tosses (or 
whatever for you is the epitome of randomness) one can simulate outcomes 
of any random variables or random processes with whatever probability dis- 
tributions one likes, as accurately as one likes, as many of them as one likes, 
as long as one's computers (and storage facilities) are large and fast enough. 
In the last section of the paper I will further discuss whether there is any 
real difference between random number generation by tossing coins or by a 
pseudo-random number generator on a computer. 

Computers can be cloned. Conceptually, one can take a computer and set 
next to it an identical copy, identical in the sense not only that the hardware 
and architecture are the same but moreover that every bit of information in 
every register, memory chip, hard disk, or whatever, is the same. 

Computer connections can be cloned. Conceptually one can collect the 
data coming through a network connection, and retransmit two identical 
streams of the same data. 

Consider a network of five computers connected linearly. I shall call
them A, X, O, Y and B. The rather plain "end" computers A and B are
under my control, the more fancy "in between" computers X, O and Y are
under the control of an anti-Bellist friend called Luigi. My friend Luigi
has come up with a local realistic theory intended to show that Bell was
wrong: it is possible to violate the Bell inequalities in a local realistic way.
I have challenged my friend to implement his theory in some computer
programs, and to be specific I have stipulated that he should violate the
Clauser, Horne, Shimony, and Holt (1969) version of the Bell inequalities,
as this version is the model for the famous Aspect, Dalibard, and Roger
(1982) experiment, and a host of recent experiments such as that of Weihs et al.
(1998). Moreover this experimental protocol was certified by Bell himself,
for instance in the "Bertlmann's socks" chapter of "Speakable and
Unspeakable", as forming the definitive test of his theory.

Another of my anti-Bellist friends, Walter, has claimed that Bell ne- 
glected the factor time in his theory. Real experiments are done in one lab- 
oratory over a lengthy time period, and during this time period, variables at 
different locations can vary in a strongly correlated way — the most obvious 
example being real clocks! Well, in fact it is clear from "Bertlmann's socks" 
that Bell was thinking very much of time as being a factor in classical corre- 
lation, see his discussion of the temporal relation between the daily number 
of heart-attacks in Lyons and in Paris (the weather is similar, French TV is 
identical, weekend or weekday is the same ...). In the course of time, the 
state of physical systems can drift in a systematic and perhaps correlated 
way. This means that the outcomes of consecutive measurements might be 
correlated in time, probability distributions are not stationary, and statistical 
tests of significance are invalidated. Information from the past is not for- 
gotten, but accumulates. The phenomenon has been named "the memory 
loophole". More insidiously, in the course of time, information can propa- 
gate from one physical subsystem to another, making everything even worse. 
(Think of French TV, reporting events in both Paris and Lyons with a short 
time lag.) In order to accommodate time I will allow Luigi to let his computers
communicate between themselves whatever they like, in between each
separate measurement, and I will make no demands whatsoever of stationarity
or independence. I do not demand that he simulates specific measurements
on a specific state. All I demand is that he violates a Bell-CHSH
inequality. I suggest that he goes for the maximal $2\sqrt{2}$ deviation corresponding
to a certain state and collection of measurement settings, but the choice is
up to him, since he has total control over his computers, and these choices 
are out of my control. My computers are just going to supply the results of 
independent fair coin tosses. 

The experiment can only generate a finite amount of data. How are we
going to decide whether the experiment has proved anything? How large
should N be and what is a criterion we can both agree to? A physicist
would say that we have a problem of finite statistics.

One of my pro-Bellist friends, Gregor, an experimental physicist, has 
claimed that his experiment shows a thirty standard deviations departure 
from local realism. As a statistician I am concerned that his calculation 
of "thirty standard deviations" was done assuming Poisson statistics, which 
comes down to assuming independence between successive measurements,
while the anti-Bellist, because of the memory loophole, need not buy this as- 
sumption, hence need not buy the conclusion. As a statistician I realise that I 
must do my probability calculation from the point of view of the local realist 
(even if in my opinion this point of view is wrong). I must show that, as- 
suming a local realist position, the probability of such an extreme deviation 
as is actually observed is very small. This is not the same as showing that, 
assuming quantum mechanics is true, the probability that my experiment 
would have given the "wrong" conclusion (i.e., a conclusion favourable to 
the local realist) is very small. Of course it is a comfort to know this in 
advance of doing the experiment, and retrospectively it confirms the experi- 
menter's skill, but to the local realist it is just irrelevant. 

Now here an interesting paradox appears: a local realist theory is typi- 
cally a deterministic theory, hence does not allow one to make probability as- 
sumptions at all. However I think that even local realists agree that there are 
situations where one can meaningfully talk probability, even if any person's 
stated interpretation of the word might appear totally different from mine. 
However he interprets the word probability, most local realists will agree 
that in a well equipped laboratory we could manufacture something pretty 
close to an idealised fair coin (by which I mean a coin together with a well- 
designed coin tossing apparatus). It could be close enough, for instance, that 
we would both be almost certain that in 40 000 tosses the number of heads 
will not exceed 20 000 by more than 1 000 (10 standard deviations).
Behind this lies a combinatorial fact: the number of binary sequences of length
40 000, in which the number of 1's exceeds 20 000 by more than 1 000, is
less than a fraction $\exp(-\tfrac{1}{2} 10^2)$ of the total number of sequences.
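
The combinatorial fact quoted above can be checked with exact integer arithmetic. The following is a minimal sketch, not part of the original argument; the function name `log_tail_fraction` is a hypothetical helper introduced only for this illustration. It counts the binary sequences of length 40 000 with more than 21 000 ones and compares the logarithm of their fraction with $-\tfrac{1}{2} 10^2 = -50$.

```python
import math

def log_tail_fraction(n: int, excess: int) -> float:
    """Natural log of the fraction of binary sequences of length n whose
    number of 1's exceeds n/2 by more than `excess` (hypothetical helper)."""
    first = n // 2 + excess + 1        # smallest qualifying number of 1's
    term = math.comb(n, first)         # C(n, first) as an exact integer
    total = 0
    for k in range(first, n + 1):
        total += term
        # C(n, k+1) = C(n, k) * (n - k) / (k + 1); the division is exact
        term = term * (n - k) // (k + 1)
    # math.log accepts arbitrarily large Python integers
    return math.log(total) - n * math.log(2)

log_fraction = log_tail_fraction(40_000, 1_000)
print(f"log(fraction) = {log_fraction:.2f}, claimed bound = {-0.5 * 10**2:.2f}")
```

The count is exact; only the final logarithm is computed in floating point, which is more than accurate enough for the comparison.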

So I hope my anti-Bellist friends will let me (the person in control of 
computers A and B) either, ahead of the experiment, store the outcomes of 
fair coin tosses in them, or simulate them with a good pseudo-random num- 
ber generator, and more importantly, will be convinced when I give proba- 
bility statements concerning this and only this source of randomness in our 
computer experiment. 

Now here are the rules of our game. We are going to simulate an
idealised, perfect (no classical loopholes) Bell-CHSH type delayed choice
experiment. For the sake of argument let us fix N = 15 000 as the total
number of trials (pairs of events, photon pairs, . . . ). In advance, Luigi has set up
his three computers with any programs or data whatsoever stored on them. 
He is allowed to program his chameleon effect, or Walter's B-splines and 
hidden-variables-which-are-not-actually-elements-of-reality, or Al's theory 
of QEM, whatever he likes. 

For n = 1, . . . , N = 15 000, consecutively, the following happens: 

1. Computer O, which we call the source, sends information to computers
X and Y, the measurement stations. It can be anything. It can be
random (previously stored outcomes of actual random experiments) or
pseudo-random or deterministic. It can depend in an arbitrary way on
the results of past trials (see item 5). Without loss of generality it can
be considered to be the same: send to each computer both its own
message and the message for the other.

2. Computers A and B, which we call the randomizers, each send a
measurement-setting-label, namely a 1 or a 2, to computers X and Y
respectively. Actually, I will generate the labels to simulate independent
fair coin tosses (I might even use the outcomes of real fair coin tosses,
done secretly in advance and saved on my computers' hard disks).

3. Computers X and Y each output an outcome ±1, computed in whatever
way Luigi likes from the available information at each measurement
station. He has all the possibilities mentioned under item 1.
What each of these two computers does not have is the
measurement-setting-label which was delivered to the other. Denote the
outcomes $x^{(n)}$ and $y^{(n)}$.

4. Computers A and B each output the measurement-setting-label which
they had previously sent to X and Y. Denote these labels $a^{(n)}$ and
$b^{(n)}$. An independent referee will confirm that these are identical to
the labels given to Luigi in item 2.

5. Computers X, O and Y may communicate with one another in any
way they like. In particular, all past setting labels are available at all
locations. As far as I am concerned, Luigi may even alter the computer
programs or memories of his machines.

At the close of these N = 15 000 trials we have collected N quadruples
$(a^{(n)}, b^{(n)}, x^{(n)}, y^{(n)})$, where the measurement-setting-labels take the
values 1 and 2, the measurement outcomes take the values ±1. We count the
number of times the two outcomes were equal to one another, and the
number of times they were unequal, separately for each of the four possible
combinations of measurement-setting-labels:

$$
\begin{aligned}
N^{=}_{ab} &= \#\{n : x^{(n)} = y^{(n)},\ (a^{(n)}, b^{(n)}) = (a, b)\}, \\
N^{\ne}_{ab} &= \#\{n : x^{(n)} \ne y^{(n)},\ (a^{(n)}, b^{(n)}) = (a, b)\}, \\
N_{ab} &= \#\{n : (a^{(n)}, b^{(n)}) = (a, b)\}.
\end{aligned}
$$

From these counts we compute four empirical correlations (a mathematical 
statistician would call them raw, or uncentred, product moments), as follows. 



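$$
\hat\rho_{ab} = \frac{N^{=}_{ab} - N^{\ne}_{ab}}{N_{ab}}.
$$

Finally we compute the CHSH contrast

$$
S = \hat\rho_{12} - \hat\rho_{11} - \hat\rho_{21} - \hat\rho_{22}.
$$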

Luigi's aim is that this number is close to $2\sqrt{2}$, or at least, much larger than
2. My claim is that it cannot be much larger than 2; in fact, I would not
expect a deviation larger than several times $1/\sqrt{N}$ above 2. Weihs et al.
(1998) obtained a value of $S \approx 2.73$, also with $N \approx 15\,000$, in an experiment
with a similar layout, except that the measurement stations were polarizing
beam-splitters measuring pairs of entangled photons transmitted from a
source through 200 m of glass fibre each, and the randomizers were quantum
optical devices simulating (close to) fair coin tosses by polarization
measurements of completely unpolarized photons, see Appendix 1. A standard
statistical computation showed that the value of S they found is 30 standard
deviations larger than 2.
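
The rules of the game and the computation of the CHSH contrast can be sketched as a small simulation. This is only an illustration under stated assumptions: Luigi's strategy here is a hypothetical one in which the source draws, at each trial, a response table (x1, x2, y1, y2) fixing what each station will answer to each possible setting, chosen uniformly among the tables that maximise x1·y2 − x1·y1 − x2·y1 − x2·y2. The names `chsh_value`, `BEST_TABLES` and `play` are introduced for this sketch only.

```python
import itertools
import random

def chsh_value(table):
    """x1*y2 - x1*y1 - x2*y1 - x2*y2 for a response table (x1, x2, y1, y2)."""
    x1, x2, y1, y2 = table
    return x1 * y2 - x1 * y1 - x2 * y1 - x2 * y2

ALL_TABLES = list(itertools.product((-1, 1), repeat=4))
BEST = max(chsh_value(t) for t in ALL_TABLES)
BEST_TABLES = [t for t in ALL_TABLES if chsh_value(t) == BEST]

def play(n_trials, seed=0):
    """Run the game once and return the CHSH contrast S."""
    settings_rng = random.Random(seed)     # randomizers A and B (fair coins)
    luigi_rng = random.Random(seed + 1)    # randomness available to source O
    n_eq = {(a, b): 0 for a in (1, 2) for b in (1, 2)}
    n_all = dict(n_eq)
    for _ in range(n_trials):
        # Item 1: the source commits to responses before settings are known.
        x1, x2, y1, y2 = luigi_rng.choice(BEST_TABLES)
        # Item 2: independent fair coin tosses choose the setting labels.
        a = settings_rng.choice((1, 2))
        b = settings_rng.choice((1, 2))
        # Item 3: each station uses only its own setting label.
        x = x1 if a == 1 else x2
        y = y1 if b == 1 else y2
        n_all[a, b] += 1
        n_eq[a, b] += (x == y)
    # Empirical correlation: (number equal - number unequal) / number of trials.
    rho = {ab: (2 * n_eq[ab] - n_all[ab]) / n_all[ab] for ab in n_all}
    return rho[1, 2] - rho[1, 1] - rho[2, 1] - rho[2, 2]

S = play(15_000)
print(f"best deterministic value = {BEST}, simulated S = {S:.3f}")
```

Any other local strategy can be substituted for the line that draws the response table, provided it is fixed before the setting labels of the current trial are generated.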

Please note that Luigi's aim is certainly achievable from a logical point 
of view. It is conceivable, even, that $N^{\ne}_{12} = 0$ and $N^{=}_{11} = N^{=}_{21} = N^{=}_{22} = 0$,
hence that $\hat\rho_{12} = +1$, $\hat\rho_{11} = \hat\rho_{21} = \hat\rho_{22} = -1$, and hence that $S = 4$. In
fact if Luigi would generate his outcomes just as I generated the settings, as 
independent fair coin tosses, this very extreme result does have a positive 
probability. The reader might like to compute the chance. 

In order to be able to make a clean probability statement, I would like to 
make some harmless modifications to $S$. First of all, note that the
"correlation" between binary (±1 valued) random variables is twice the probability
that they are equal, minus 1:

$$
\hat\rho_{ab} = \frac{N^{=}_{ab} - (N_{ab} - N^{=}_{ab})}{N_{ab}} = 2\,\frac{N^{=}_{ab}}{N_{ab}} - 1.
$$

Define

$$
\hat p^{=}_{ab} = N^{=}_{ab} / N_{ab}.
$$

Luigi's aim is to have

$$
(S - 2)/2 = \hat p^{=}_{12} - \hat p^{=}_{11} - \hat p^{=}_{21} - \hat p^{=}_{22}
$$

close to $\sqrt{2} - 1$, my claim is that it won't be much larger than 0. Now
multiply $(S - 2)/2$ by $N/4$ and note that the four denominators $N_{ab}$ in the
formulas for the $\hat p^{=}_{ab}$ will all be pretty close to the same value, $N/4$. I propose
to focus on the quantity $Z \approx N(S - 2)/8$ obtained by cancelling the four
denominators against $N/4$:

$$
Z = N^{=}_{12} - N^{=}_{11} - N^{=}_{21} - N^{=}_{22}.
$$

Luigi's aim is to have this quantity close to $N(\sqrt{2} - 1)/4 \approx N/10$, or at
least, significantly larger than 0, while I do not expect it to be larger than
0 by several multiples of $\sqrt{N}$.

He will not succeed. It is a theorem that whatever Luigi's programs 
and stored data, and whatever communication between them at intermediate 
steps, 



$$
\Pr\{Z \ge k\sqrt{N}\} \le \exp(-\tfrac{1}{2} k^2),
$$

where $k \ge 0$ is arbitrary. For instance, with $N = 15\,000$, and $k = 12.25$,
one finds that $k\sqrt{N} \approx N/10$ while $\exp(-\tfrac{1}{2} 12.25^2) < 10^{-32}$.
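
The arithmetic of this example is easy to reproduce. The sketch below simply evaluates the quantities named in the text for N = 15 000 and k = 12.25; the variable names are chosen for this illustration only.

```python
import math

N = 15_000
k = 12.25

threshold = k * math.sqrt(N)              # the deviation k * sqrt(N) in the bound
target = N * (math.sqrt(2) - 1) / 4       # the value of Z that Luigi aims for
bound = math.exp(-0.5 * k**2)             # right hand side exp(-k^2 / 2)

print(f"k*sqrt(N) = {threshold:.1f}, N/10 = {N / 10:.1f}")
print(f"Luigi's target N(sqrt(2)-1)/4 = {target:.1f}")
print(f"exp(-k^2/2) = {bound:.2e}")
```

Since Luigi's target lies above the threshold, the probability of reaching the target is no larger than the bound.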

In fact I can improve this result (as if improvement were necessary!),
replacing k in the right hand side by a number one and three quarters times
as large, by a technique called random time change, which I shall explain
later. But I cannot get any further improvement; in particular, I cannot reach
$\exp(-\tfrac{1}{2} 30^2)$, corresponding to Weihs et al.'s (1998) thirty standard
deviations. Why? Because their calculation (with $N \approx 15\,000$) was done
assuming independent and identically distributed trials, and assuming probabilities
equal to the observed relative frequencies, very close to those predicted by 
quantum mechanics; whereas my calculation is done assuming local real- 
ism, under the most favourable conditions possible under local realism, and 
assuming no further randomness than the independent fair coin tosses of the 
randomizers. 

If you are unhappy about my move from correlations to counts, let me 
just say that I can make similar statements about the original $S$, by
combining the probability inequality for $Z$ with similar but easier probability
inequalities for the $N_{ab}$.

3. Martingales



Let me give a sketch of the proof. I capitalize the symbols for the settings 
and outcomes because I am thinking of them as random variables. Write 
each of the counts in the expression for Z as a sum over the N trials of an 
indicator variable (a zero/one valued random variable) indicating whether or 
not the event to be counted occurred on the nth trial. A difference of sums is
a sum of differences. Consequently, if we define

$$
\begin{aligned}
\Delta^{(n)}_{ab} &= \mathbb{1}\{X^{(n)} = Y^{(n)},\ (A^{(n)}, B^{(n)}) = (a, b)\}, \\
\Delta^{(n)} &= \Delta^{(n)}_{12} - \Delta^{(n)}_{11} - \Delta^{(n)}_{21} - \Delta^{(n)}_{22}, \\
Z^{(n)} &= \sum_{m=1}^{n} \Delta^{(m)},
\end{aligned}
$$

then $Z = Z^{(N)}$. Now I will show in a moment, using a variant of Bell's (1964)
argument, that for each $n$, conditional on the past history of the first $n - 1$
trials, the expected value of $\Delta^{(n)}$ does not exceed 0, whatever that history
might be. Moreover, $\Delta^{(n)}$ can only take on the values $-1$, $0$ and $1$; in
particular, its maximum minus its minimum possible value (its range) is less than
or equal to 2. This makes the stochastic process $Z^{(0)}, Z^{(1)}, \ldots, Z^{(n)}, \ldots, Z^{(N)}$
a supermartingale with increments in a bounded range, and with initial
value $Z^{(0)} = 0$. The definition of a supermartingale is precisely the property
that the increments $\Delta^{(n)}$ have nonpositive conditional expectation given the
past, for each n. A supermartingale is a generalisation of a random walk with 
zero or negative drift. Think for instance of the amount of money in your 
pocket as you play successive turns at a roulette table, where the roulette 
wheel is perfect, but the presence of a 0 and 00 means that on average,
whatever amount you stake, and whatever you bet on, you lose 1/19 of your 
stake at each turn. You may be using some complex or even randomized 
strategy whereby the amount of your stake, and what you bet on (a specific 
number, or red versus black, or whatever) depends on your past experience 
and on auxiliary random inputs but still you lose on average, conditional on 
the past at each time point, whatever the past. The capital of the bank is 
a submartingale (nonnegative drift). If there were no 0 and 00, both
capitals would be martingales (zero drift). In a real roulette game there will
be a maximum stake and hence a maximum payoff. Your capital changes by 
an amount between the maximal payoff and minus the maximal stake. Thus 
your capital while playing roulette develops in time as a supermartingale 
with increments of bounded range (maximal payoff plus maximal stake). If 
you cannot play more than N turns, with whatever strategy you like
(including stopping early), it can only very rarely happen that your capital increases
by more than a few times $\sqrt{N}$ times half the range, as we shall now see.

According to Hoeffding's (1963) inequality, if a supermartingale
$(Z^{(n)} : n = 0, 1, \ldots, N)$ is zero at time $n = 0$, and the range of its increments is
bounded by 2, then

$$
\Pr\{\max_{n \le N} Z^{(n)} \ge k\sqrt{N}\} \le \exp(-\tfrac{1}{2} k^2).
$$

Note that if the increments of the supermartingale were actually independent
and identically distributed, with range bounded by 2, then the maximum
variance of $Z^{(N)}$ is precisely equal to $N$, achieved when the increments are
equal to ±1 with equal probability $\tfrac{1}{2}$. The Chebyshev inequality (sometimes
known as Markov inequality) would then tell us that $Z^{(N)}$ exceeds $k\sqrt{N}$
with probability smaller than $1/k^2$. Hoeffding has improved this in two
ways: an exponentially instead of geometrically decreasing probability, and
a maximal inequality instead of a pointwise inequality. One cannot do much
better than this inequality: in the most favourable case, just described, for
large $N$ we would have that $Z^{(N)}$ is approximately normally distributed
with variance $N$, and the probability of large deviations of a normal variate
behaves up to a constant and a lower order (logarithmic) term precisely like
$\exp(-\tfrac{1}{2} k^2)$.
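
To see how the three tail estimates mentioned in this paragraph compare, here is a small numerical sketch. It tabulates the Chebyshev bound $1/k^2$, the Hoeffding bound $\exp(-\tfrac{1}{2} k^2)$ and the exact upper tail of a standard normal variate; the function names are illustrative only, and the normal tail is computed from the complementary error function.

```python
import math

def chebyshev_bound(k: float) -> float:
    """Chebyshev-type bound 1/k^2 (capped at 1)."""
    return min(1.0, 1.0 / k**2)

def hoeffding_bound(k: float) -> float:
    """Hoeffding bound exp(-k^2 / 2)."""
    return math.exp(-0.5 * k**2)

def normal_tail(k: float) -> float:
    """P(standard normal > k), via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in (1, 2, 3, 5, 10, 12.25):
    print(f"k = {k:5}: Chebyshev {chebyshev_bound(k):.2e}, "
          f"Hoeffding {hoeffding_bound(k):.2e}, normal {normal_tail(k):.2e}")
```

On a logarithmic scale the Hoeffding bound and the normal tail agree to leading order, while the Chebyshev bound falls off far more slowly.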

The proof of Hoeffding's inequality can be found in the better elementary
probability textbooks and uses Markov's inequality, together with a random
time change argument, and finally some elementary calculus. This
gives a clue to how I can improve the result: consider the random process
only at the times when $\Delta^{(n)} \ne 0$. In other words, thin out the time points
n = 0, 1, ... in a random way, only look at the process at the time points
which are left. By Doob's optional stopping theorem it is still a supermartingale
when only looked at intermittently, even when we only look at random
time points, provided that we never need to look ahead to select these time
points. The increments of the thinned process still have a range bounded by
2. Hence Hoeffding's inequality still applies. However, time is now running
faster, thus the value of N in the inequality as stated for the new process
corresponds to cN in the old, with c > 1. In fact, in the actual experiment we
only see a ±1 in a fraction $0.325 = \tfrac{1}{4} \sum_{ab} \hat p^{=}_{ab}$ of all trials, hence we can
improve the k on the right hand side by a factor $1/\sqrt{0.325} \approx 1.75$, hence
12.25 can be increased to 21.5.
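
The numbers in this improvement follow from the quoted fraction 0.325 alone. A short sketch, with variable names chosen for illustration, reproduces them and also shows why thirty standard deviations remain out of reach.

```python
import math

fraction_nonzero = 0.325                  # fraction of trials with a nonzero increment (from the text)
k_old = 12.25                             # value of k used before the time change

speedup = 1 / math.sqrt(fraction_nonzero) # factor gained by the random time change
k_new = k_old * speedup
bound_new = math.exp(-0.5 * k_new**2)     # improved Hoeffding bound
bound_30 = math.exp(-0.5 * 30**2)         # what thirty standard deviations would need

print(f"factor = {speedup:.3f}, improved k = {k_new:.2f}")
print(f"improved bound = {bound_new:.2e}, exp(-30^2/2) = {bound_30:.2e}")
```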

It remains to prove the supermartingale property. Consider the quantity
$\Delta^{(n)}$. Condition on everything which happened in the first $n - 1$ trials, and
also on whatever new information Luigi placed on his computers between
the $(n-1)$st and $n$th trial. Consider the situation just after Luigi's computers
X and Y have received their information from O, just before they receive
the settings from A and B. Under my conditioning, the state of Luigi's
computers is fixed (non random). Clone Luigi's computers (this is only a
thought experiment). Give the first copy of computer X the input 1, as if this
value came from A, and give the second copy the input 2; do the same in the
other wing of the experiment. Let's drop the upper index $(n)$, and denote by
$x_1$ and $x_2$ the outputs of the two clones of X, denote by $y_1$ and $y_2$ the outputs
of the two clones of Y. Because we are conditioning on the past up till the
generation of the settings in the nth trial, everything is deterministic except
the two random setting labels, denoted by $A$ and $B$. The actual output from
the actual (uncloned) computer X is $X = x_A$, similarly in the other wing of
the experiment. We find

$$
\begin{aligned}
\Delta^{(n)}_{ab} &= \mathbb{1}\{X = Y,\ (A, B) = (a, b)\} \\
                  &= \mathbb{1}\{x_a = y_b\}\, \mathbb{1}\{(A, B) = (a, b)\}.
\end{aligned}
$$

The (conditional) expectation of this quantity is $\mathbb{1}\{x_a = y_b\}/4$, since the
randomizers still produce independent fair coin tosses given the past, and
given whatever further modifications Luigi has made. Hence the expectation
of $\Delta^{(n)}$ given the past up to the start of the nth trial equals one quarter times
$\mathbb{1}\{x_1 = y_2\} - \mathbb{1}\{x_1 = y_1\} - \mathbb{1}\{x_2 = y_1\} - \mathbb{1}\{x_2 = y_2\}$. Now since the $x_a$
and $y_b$ only take the values ±1, it follows that $(x_1 y_2)(x_1 y_1)(x_2 y_1)(x_2 y_2) = +1$.
The value of a product of two ±1 valued variables encodes their equality
or inequality. We see that the number of equalities within the four pairs
involved is even. It is not difficult to see that it follows from this, that the
value of $\mathbb{1}\{x_1 = y_2\} - \mathbb{1}\{x_1 = y_1\} - \mathbb{1}\{x_2 = y_1\} - \mathbb{1}\{x_2 = y_2\}$ can
only be $0$ or $-2$, so is always less than or equal to 0. We have proved the
required property of the conditional expectation of $\Delta^{(n)}$ conditioning not
only on the past $n - 1$ trials but also on what happens between the $(n-1)$st and
$n$th trial. Average over all possible inter-trial happenings, to obtain the result
we want. The theorem is proved.
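
The key step of the proof, that the signed combination of the four equality indicators is never positive, can also be confirmed by brute force over the sixteen possible values of the four clone outputs. The sketch below does exactly that; `signed_equalities` is a name introduced here for illustration.

```python
import itertools

def signed_equalities(x1, x2, y1, y2):
    """1{x1 = y2} - 1{x1 = y1} - 1{x2 = y1} - 1{x2 = y2} for fixed clone outputs."""
    return (x1 == y2) - (x1 == y1) - (x2 == y1) - (x2 == y2)

# Enumerate all 16 assignments of +1/-1 to (x1, x2, y1, y2).
values = {signed_equalities(*outputs)
          for outputs in itertools.product((-1, 1), repeat=4)}
print(sorted(values))
```

Dividing by four gives the possible conditional expectations of the increment, none of which is positive.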

As I remarked before, computers X, O and Y are allowed to communicate
in any way they like between trials, and Luigi is even allowed to
intervene between trials, changing their programs or data as he likes, even
in a random way if he likes. He can make use in all his computers of the
outcomes of the randomizers at all previous trials. It does not help. No
assumption has been made of any kind of long run stability of the outcomes
of his computers, or stationarity of probability distributions. The only
requirement has been on my side, that I am allowed to choose setting labels at
random, again and again. Only this randomness drives my conclusion. You
may see my theorem as a combinatorial statement, referring to the fraction
of results obtained under all the $4^N$ different combinations of values of all
$a^{(n)}$ and $b^{(n)}$.
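
The point that memory does not help can be illustrated with a simulation in which Luigi's choice of responses depends on the recorded past. This is only a sketch under stated assumptions: the rule by which the past selects a response table (`memory_digest`) is an arbitrary, hypothetical choice, and any other rule that does not look at the current settings could be substituted. The process tracked is the running sum of the increments, which are +1 for equal outcomes under settings (1, 2), −1 for equal outcomes under the other settings, and 0 otherwise.

```python
import itertools
import math
import random

TABLES = list(itertools.product((-1, 1), repeat=4))   # all tables (x1, x2, y1, y2)

def run_with_memory(n_trials, seed):
    """Return (final Z, running maximum of Z) for one run of the game."""
    rng = random.Random(seed)      # the randomizers: the only randomness used
    history = []                   # past settings and outcomes, known to Luigi
    z = 0
    z_max = 0
    for _ in range(n_trials):
        # Hypothetical memory rule: recent settings and outcomes pick the table.
        # (Luigi may use the whole past; the last 20 trials keep the sketch fast.)
        memory_digest = sum(3 * a + 5 * b + (x == y) for a, b, x, y in history[-20:])
        x1, x2, y1, y2 = TABLES[memory_digest % 16]
        # Fresh fair coin tosses, generated after the table is fixed.
        a, b = rng.choice((1, 2)), rng.choice((1, 2))
        x = x1 if a == 1 else x2
        y = y1 if b == 1 else y2
        if x == y:
            z += 1 if (a, b) == (1, 2) else -1
        z_max = max(z_max, z)
        history.append((a, b, x, y))
    return z, z_max

N = 2_000
worst = max(run_with_memory(N, seed)[1] for seed in range(50))
print(f"largest running maximum over 50 runs: {worst / math.sqrt(N):.2f} * sqrt(N)")
```

By the maximal inequality, a running maximum of six times the square root of N or more would have probability at most exp(−18) in any single run.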

Further details are given in Gill (2003), though there I used the Bernstein
rather than the Hoeffding inequality; Hoeffding turned out to give sharper 
results. A publication is in preparation giving more mathematical details and 
further results. In particular one can give similar Hoeffding bounds for the 
original quantity of interest S, and the unbiasedness of the two randomizers 
is not crucial. In fact Weihs had probabilities of heads equal to 0.48 and to 
0.42 in the two wings of his experiment. 

Martingales (avant la lettre) were introduced into probability theory by 
the great French probabilist Paul Lévy in 1935. The name martingale was
given to them a few years later by his student Ville, who used them to ef- 
fectively destroy Richard von Mises' programme to found probability on the 
notion of collectives and limiting relative frequencies. Only Andrei Nikolae- 
vich Kolmogorov realized that this conclusion was false, and he went on to 
develop the notion of computational complexity based on von Mises' ideas. 
Later still, the Dutch mathematician Michiel van Lambalgen has shown that 
a totally rigorous mathematical theory of collectives can be derived if one re- 
places the axiom of choice (which makes mathematical existence theorems 
easy, a double edged sword since it creates pathologies as well as desired 
results) with an alternative axiom, closer to physical intuition. 

The year 1935 also saw the introduction, by the great British statistician 
Sir Ronald Aylmer Fisher, of the notion of randomization into experimental 
design. He showed that randomized designs gave an experimenter total con- 
trol of uncontrollable factors which could otherwise prevent any conclusions 
being drawn from an experiment. 



4. Metaphysics 

The interpretation of Bell's theorem depends on notions of what is quantum 
mechanics, what is local realism, and behind them, what is probability. By 
the way, Bell himself does not state a theorem; just shows that certain as- 
sumptions imply a certain inequality. He shows that under a conventional in- 
terpretation of quantum mechanics, this inequality could be violated. How- 
ever, it has become conventional to call the statement that quantum me- 
chanics and local realism are incompatible with one another, Bell's theorem. 
This is a very convenient label, all the more convenient since later authors 
have obtained the same conclusion through consideration of other predictions
of quantum mechanics, some of them not on the face of it involving an
inequality as Bell's. Actually, van Dam, Gill, and Grünwald (2003) argue
elsewhere that these proofs of Bell's theorem without inequalities (Hardy, 1993),
or even without probability (Greenberger, Horne, and Zeilinger, 1989), do actually
involve hidden probability inequalities.

On the one hand, Bell's theorem depends on an interpretation of quan- 
tum mechanics, together with an assumption that certain states and measure- 
ments, which one can consider as allowed by the mathematical framework, 
can also arise "in Nature", including Nature as manipulated by an experi- 
menter in a laboratory. What I call Bell's missing fifth position, is the po- 
sition that quantum mechanics itself forbids these states ever to exist. And 
not just the specific states and measurements corresponding to a particular 
proof of Bell's theorem, but any which one could use in the proof. Restrict- 
ing attention to a Bell-CHSH type experimental set-up, one does not need to 
achieve the magic $2\sqrt{2}$; one only needs to significantly exceed the bound 2.
However, let me briefly describe the calculations behind this magic number 
(an upper bound under quantum mechanics, according to the Cirel'son in- 
equality), since this leads naturally to a discussion of the role of probability. 

It is conventional and reasonable to take the Hilbert space corresponding
to a physical system consisting of two well separated parts of space as being
the tensor product of spaces corresponding to the two parts separately. To
achieve $2\sqrt{2}$, we need that a state exists (can be made to exist) of the joint
system, which can be written (approximately) in the form $|00\rangle + |11\rangle$ (up to
normalization, and up to a tensor product with whatever else you like, pure
or mixed); where as usual $|00\rangle = |0\rangle \otimes |0\rangle$, $|11\rangle = |1\rangle \otimes |1\rangle$, and $|0\rangle$ and $|1\rangle$
stand for two orthonormal vectors in both the first and the second space. We
need that one can simultaneously (to a good enough approximation) measure
whether the first subsystem is in the state $\cos\alpha\,|0\rangle + \sin\alpha\,|1\rangle$ or in the state
orthogonal to this, $\sin\alpha\,|0\rangle - \cos\alpha\,|1\rangle$; and whether the second is in the state
$\cos\beta\,|0\rangle + \sin\beta\,|1\rangle$ or in $\sin\beta\,|0\rangle - \cos\beta\,|1\rangle$; where one may choose between
$\alpha = \alpha_1$ or $\alpha = \alpha_2$, and between $\beta = \beta_1$ and $\beta = \beta_2$; and where a good
choice of angles (settings) leading to the famous $2\sqrt{2}$ is $\alpha_1 = -\pi/4 - \pi/8$,
$\alpha_2 = -\pi/8$, $\beta_1 = 0$, $\beta_2 = \pi/4$.

Conventionally it is agreed that the probability to find subsystem one in
state $|\alpha\rangle = \cos\alpha\,|0\rangle + \sin\alpha\,|1\rangle$ and subsystem two in state
$|\beta\rangle = \cos\beta\,|0\rangle + \sin\beta\,|1\rangle$, when prepared in $|\Psi\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$, is the
squared length of the inner product of $|\Psi\rangle$ with $|\alpha\rangle \otimes |\beta\rangle$, which turns out to
equal $\frac{1}{2}\cos^2(\alpha - \beta)$. This is the probability of the outcome $+1, +1$. The
probability of $-1, -1$ turns out to be the same, while that of $+1, -1$ and of
$-1, +1$ are both equal to $\frac{1}{2}\sin^2(\alpha - \beta)$. The marginal probabilities of $\pm 1$ now
turn out to equal $\frac{1}{2}$ and the probability of equal outcomes is $\cos^2(\alpha - \beta)$. Under
the choices of angles above, one obtains $p^{=}_{12} = (1 + 1/\sqrt{2})/2 \approx 0.85$, while all
the other $p^{=}_{ab} = (1 - 1/\sqrt{2})/2 \approx 0.15$. Consequently
$p^{=}_{12} - p^{=}_{11} - p^{=}_{21} - p^{=}_{22} = \sqrt{2} - 1$.
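The arithmetic in this paragraph can be checked numerically. The following is a minimal sketch, assuming the angle settings as printed above and the Born-rule formulas for the four outcome pairs; the function and variable names are hypothetical. The sketch does not rely on which index pair carries the large value: it simply picks out the largest of the four probabilities of equal outcomes and subtracts the other three.

```python
import math

# Angle settings as printed in the text (radians).
alpha = {1: -math.pi / 4 - math.pi / 8, 2: -math.pi / 8}
beta = {1: 0.0, 2: math.pi / 4}

def joint_probabilities(a, b):
    """Born-rule probabilities of the four outcome pairs for angles a and b."""
    same = 0.5 * math.cos(a - b) ** 2   # each of (+1,+1) and (-1,-1)
    diff = 0.5 * math.sin(a - b) ** 2   # each of (+1,-1) and (-1,+1)
    return {(+1, +1): same, (-1, -1): same, (+1, -1): diff, (-1, +1): diff}

def prob_equal(a, b):
    """Probability that the two wings report the same outcome."""
    p = joint_probabilities(a, b)
    return p[(+1, +1)] + p[(-1, -1)]

# Probability of equal outcomes for each of the four setting pairs.
p_equal = {(i, j): prob_equal(alpha[i], beta[j]) for i in (1, 2) for j in (1, 2)}

# Largest probability minus the other three.
largest = max(p_equal, key=p_equal.get)
combination = p_equal[largest] - sum(v for k, v in p_equal.items() if k != largest)

print(p_equal)
print(combination, math.sqrt(2) - 1)
```

One setting pair gives a probability of equal outcomes near 0.85, the other three near 0.15, and the combination agrees with $\sqrt{2} - 1$ up to rounding.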

Out of these calculations came joint probabilities of outcomes of binary
measurements, and every word here needs to be taken literally if the
argument is to proceed: there are measurements taken in both wings of the
experiment, and each can only result in a ±1. We use quantum mechanics 
to tell us what the probability of various combinations of outcomes is. Now 
there are a great many ways to try to make sense of the notion of probability, 
but everyone who uses the word in the context of quantum mechanics would 
agree that if one repeatedly measures a quantum system in the same state, in 
the same way, then relative frequencies of the various possible outcomes will 
stabilize in the long run, and they will stabilize to the probabilities, whatever 
that word may mean, computed by quantum mechanics. In the quantum
version of our experiment, $Z/N$ will stabilize to the value $(\sqrt{2} - 1)/4$.
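The stabilization of relative frequencies can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: it takes $Z$ to be the number of equal outcomes at setting pair (1,2) minus the number of equal outcomes at the other three pairs, it draws the two settings by independent fair coin tosses, and it samples equality of outcomes directly from the probabilities quoted in the text. The function name and the seed are hypothetical.

```python
import math
import random

def simulate_quantum_run(n_trials, p_equal, special, seed=0):
    """Return Z/N for a run with uniformly random setting pairs.

    Z counts equal outcomes at the `special` setting pair minus equal
    outcomes at the other three pairs (assumed definition of Z).
    """
    rng = random.Random(seed)
    z = 0
    for _ in range(n_trials):
        pair = (rng.choice((1, 2)), rng.choice((1, 2)))   # fair coin per wing
        equal = rng.random() < p_equal[pair]              # quantum prediction
        if equal:
            z += 1 if pair == special else -1
    return z / n_trials

high = (1 + 1 / math.sqrt(2)) / 2   # about 0.85
low = (1 - 1 / math.sqrt(2)) / 2    # about 0.15
p_equal = {(1, 2): high, (1, 1): low, (2, 1): low, (2, 2): low}

ratio = simulate_quantum_run(200_000, p_equal, special=(1, 2))
print(ratio, (math.sqrt(2) - 1) / 4)
```

With a few hundred thousand trials the simulated ratio should lie close to $(\sqrt{2} - 1)/4 \approx 0.104$; the factor 1/4 comes from each setting pair being chosen with probability 1/4.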

My mathematical derivation of a stronger (probabilistic) version of the 
Bell inequalities did not hinge on any particular interpretation of probability. 
Someone who uses the word probability has a notion of fair coin tosses, and 
will not hesitate to apply probability theory to experiments involving nothing 
else than two times 15 000 fair coin tosses. If a certain event specified before 
the coins are tossed has a probability smaller than $10^{-32}$, one is not going to
see that event happen (even though logically it might happen). 

It seems to me that the interpretation of probability does not play any se- 
rious role in the ongoing controversy concerning Bell's theorem. What does 
play a role is that quantum mechanics is used to compute joint probabilities 
of outcomes of binary measurements. 

Many quantum physicists will object that real physicists do not use quan- 
tum mechanics to compute probabilities, only the certain values of averages 
pertaining to huge collectives. Many others avoid recourse to Born's law by 
extending the quantum mechanical treatment to as large a part of the mea- 
surement device as possible. If probability is involved it appears to come in 
through an uncontroversial backdoor as statistical variation in the medium 
or the elements of the collective. 

That may be the situation in many fields, but people in those fields do 
not then test Bell's theorem. The critical experiment involves binary out- 
comes and binary settings, committed to sequentially as I have outlined. A 
better objection is that in no experiment done to date has the experimental
protocol described in my computer metaphor been literally enforced. For
instance, in Weihs et al.'s (1998) experiment, the only one to date where the
randomization of detector settings at a sufficiently fast rate was taken
seriously (Aspect did his best but could only implement a poor surrogate), the
N = 15 000 events were post-selected from an enormously much larger col- 
lection of small time intervals in most of which there was no detection event 
at all in either wing of the experiment; in a small proportion there was one 
detection event in one wing of the experiment or the other but not both; and 
in a smaller proportion still, there was a detection event in both wings of the 
experiment. Bell's argument just does not work when the binary outcomes 
are derived from a post-experimental conditioning (post-selection) on values
of other variables. Other experiments, free of this loophole, did not (and
could not) implement the delayed random choice of settings; for instance
Rowe et al.'s (2001) experiments with trapped ions.
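Why post-selection undermines the argument can be seen in a toy model. The following sketch is a hypothetical construction, not a model of any actual experiment: the source sends each station a guessed setting together with an outcome prepared for that guess, and a station reports "no detection" whenever its actual setting differs from the guess. Each station uses only its own setting and the message from the source, so the model is local, yet the coincidences reproduce the quantum probabilities of equal outcomes quoted in the text. The statistic $Z$ is assumed to be the count of equal outcomes at setting pair (1,2) minus the count at the other three pairs.

```python
import math
import random

HIGH = (1 + 1 / math.sqrt(2)) / 2
LOW = (1 - 1 / math.sqrt(2)) / 2
P_EQUAL = {(1, 2): HIGH, (1, 1): LOW, (2, 1): LOW, (2, 2): LOW}

def source(rng):
    """Hidden message: a guessed setting pair plus outcomes prepared for it."""
    guess = (rng.choice((1, 2)), rng.choice((1, 2)))
    x = rng.choice((+1, -1))
    y = x if rng.random() < P_EQUAL[guess] else -x
    return guess, x, y

def station(setting, guessed_setting, outcome):
    """Local rule: report the prepared outcome only if the guess was right."""
    return outcome if setting == guessed_setting else None   # None = no detection

def run(n_trials, seed=1):
    rng = random.Random(seed)
    z = kept = 0
    for _ in range(n_trials):
        guess, x, y = source(rng)
        a, b = rng.choice((1, 2)), rng.choice((1, 2))   # settings chosen independently
        out_a = station(a, guess[0], x)
        out_b = station(b, guess[1], y)
        if out_a is None or out_b is None:
            continue                                    # post-selection on coincidences
        kept += 1
        if out_a == out_b:
            z += 1 if (a, b) == (1, 2) else -1
    return z, kept

z, kept = run(400_000)
print(kept, z / kept)
```

About a quarter of the trials survive the post-selection, and among them $Z$ divided by the number of kept events is positive and close to $(\sqrt{2} - 1)/4$, even though nothing non-local happens anywhere in the program.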

Bell was well aware of this problem. In "Bertlmann's socks" he offers a
resolution, whereby the source O may output at random time moments a signal
that something is about to happen. Measurements at X and Y based on
a stream of random settings from A and B take place continuously, but after 
the experiment has run for some time, one selects just those measurements 
within an appropriate time interval after a saved "alert" message from O. It 
is practically extremely important that this selection may be done after the 
experiment has run its course. Post-selection is bad, but post-pre-selection 
is fine. 

By the way, the martingale methods I outlined above are admirably 
suited to adaptation and extension to continuous time measurement (of dis- 
crete events). Under reasonable (but of course untestable) "unbiased detec- 
tion" assumptions, one can obtain the same kind of inequalities, but now 
allowing detection events at random time points, and a random total number 
of events. 

But is "local realism" adequately represented by my metaphor of a com- 
puter network? For Bell, the key property of the crucial experiment is that 
the measurement station X commits itself to a specific (binary) outcome, 
shortly after receiving a (binary) input from randomizer A, before a signal 
from the other wing of the experiment could have arrived with information 
concerning the input which randomizer B generated in the other wing of the 
experiment. In the short time period between input of a and output of x, 
as far as the physical mechanism leading to the result x is concerned, we 
need only consider a bounded region of space which completely excludes 
the physical systems B and Y. For me, "local realism" should certainly
imply that a sufficiently detailed (microscopic) specification of the state in 
some bounded region of space would (mostly) fix the outcomes of macro- 
scopic, discrete (for instance, binary) variables. For instance, a sufficiently 
detailed specification of the initial state of a coin-tossing apparatus would 
(mostly) fix the outcome. This does not prevent the outcome from being
apparently random, on the contrary, but it does "explain" the apparent ran- 
domness through the variation of the initial conditions when the experiment 
is repeated. 

This means that in a thought experiment one can clone the relevant as- 
pects of the relevant portion of physical space, and one can carry out the 
thought experiment: feed into the same physical system both of the possi- 
ble inputs from the randomizer and thereby fix both the possible outputs. 
The output you actually see is what you would have seen if you had
chosen the input which you actually chose.

Bell (1964, 1987) used a statistical conditional independence assumption,
together with an assumption that conditional probability distributions
of outcomes in one wing of the experiment do not depend on settings in the 
other wing, rather than my "counterfactual definite" characterization of lo- 
cal realism. Actually it is a mathematical theorem that the two mathematical 
notions are equivalent to one another. Each implies the other. Note that I 
do not require that my counterfactual or hidden variables physically exist, 
whatever that might mean, but only that they can be mathematically intro- 
duced in such a way that the mathematical model with "counterfactuals" 
reproduces the joint probability distribution of the manifest variables. 
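The counterfactual characterization lends itself to a brute-force check. In the sketch below, which is an illustration with hypothetical names, a local realist "state" fixes in advance the outcome at each station under each of its two settings, giving four binary numbers $x_1, x_2, y_1, y_2$. The quantity examined is the indicator of equal outcomes at setting pair (1,2) minus the indicators at the other three pairs; this is assumed to be the combination underlying the inequality discussed in the text.

```python
from itertools import product

def equal(u, v):
    """Indicator that two outcomes coincide."""
    return 1 if u == v else 0

def z_contribution(x1, x2, y1, y2):
    """Equal outcomes at settings (1,2) minus equal outcomes at (1,1), (2,1), (2,2)."""
    return equal(x1, y2) - equal(x1, y1) - equal(x2, y1) - equal(x2, y2)

# All 16 deterministic assignments of counterfactual outcomes.
values = [z_contribution(*state) for state in product((+1, -1), repeat=4)]
max_value = max(values)
min_value = min(values)
print(max_value, min_value)
```

No assignment makes the combination positive: if $x_1 = y_2$, then at least one of the other three equalities must hold too, since three disagreements in a row would force $x_1 \ne y_2$. Averaging over any probability distribution of hidden states therefore cannot produce a positive expectation, in contrast to the quantum value $\sqrt{2} - 1$.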

In my opinion the present unfashionableness of counterfactual reasoning 
in the philosophy of science is quite misguided. We would not have ethics, 
justice, or science, without it. 

The original EPR argument also gives support for these counterfactuals: 
we know that if one measures with the same settings in the two wings of 
the experiment, one would obtain the same outcomes. Hence a local realist 
(like Einstein) quite reasonably considers the outcome which one would find 
under a given setting in one wing of the experiment, as deterministically 
encoded in the physical state of that part of the physical system, just before 
it is measured, independently of how it is actually measured. 

In my opinion the stylized computer network metaphor for a good Bell- 
CHSH type experiment is precisely what Bell himself was getting at. One 
cannot attack Bell on the grounds that this experiment has never been done 
yet. One might attack him on the grounds that it never can be done. One 
will need good reasons for this. His argument does not require photons, nor 
this particular state and these particular measurements. Again, showing that 
a particular experimental set-up using a particular kind of physical system is 
unfeasible, does not show that all experimental set-ups are unfeasible. 

5. A Miscellany of Anti-Bellist Views



Bell's Four Positions 

Bell offered four quite different positions which one might like to take, each
compatible with his mathematical results. They were:

1. Quantum mechanics is wrong. 

2. Predetermination. 

3. Nature is non-local. 

4. Don't care (Bohr).

In my opinion he missed an intriguing fifth position:

5. A decisive experiment cannot be done. 

I would like to discuss a number of recent works in the light of these possi- 
bilities and the results I have described above. 

Accardi and the Chameleon Effect 

In numerous works L. Accardi claims that Bell's arguments are fundamen- 
tally flawed, because Bell could only think of randomness in a classical way: 
pulling coloured balls out of urns, where the colour you get to see was the 
colour which was already painted on the ball you happened to pick. If
however you select a chameleon out of a cage, where some chameleons are 
mutant, and you place the chameleon on a leaf, it might turn green, or it 
might turn brown, but it certainly did not have that colour in advance. 

This is certainly a colourful metaphor but I do not think that chameleons 
are that different from coloured billiard balls: according to Accardi's own 
story whether or not a chameleon is mutant is determined by its genes, which 
certainly did not get changed by picking up one chameleon or another; and 
a mutant chameleon always turns brown when placed on a green leaf. 

The metaphor is also supposed to carry the idea that the measurement 
outcome is not a preexisting property of the object, but is a result of an evo- 
lution of measurement apparatus and measured object together. It seems to 
me that this is precisely Bohr's Copenhagen interpretation: one cannot see 
measurement outcomes separate from the total physical context in which 
they appear. Bohr's answer to EPR was to apply this idea also rigorously 
even when two parts of the measurement apparatus and two parts of the 
object being measured are light-years apart. This philosophy certainly abol- 
ishes the EPR paradox but to my mind hardly explains it. 

Accardi does provide some mathematics (of the quantum probability
kind) which is supposed to provide a local realistic model of the EPR phe- 
nomenon. Naturally a good quantum theoretician is able to replace the von 
Neumann measurement of one photon by a Schrödinger evolution of a photon
in interaction with a measurement device in such a way that though
particle and apparatus together are still in a pure state at the end of the evo- 
lution, the reduced state of the measurement apparatus is a mixed state over 
two macroscopically distinct possibilities. One can do this for the two par- 
ticles simultaneously and arrive at a mathematical model which reproduces 
the EPR correlations in a local way, in a sense that the various items in the 
model can be ascribed to separate parts of reality. I don't think it qualifies 
as a local realistic model. 

However Accardi believes it is a local realistic model in the sense that he 
could have computer programs running on a network of computers which 
would simulate the EPR correlations, while implementing his mathemati- 
cal theory. These computer programs have run through several versions but 
presently Accardi's web site does not seem to be accessible. Unfortunately
none of the versions I have been able to test allowed me the sort of control 
over the protocol of the experiment, to which I am entitled under. In particu- 
lar, I was not able to see the raw data, only correlations. However by setting 
N = 1 one can get some idea what is going on inside the blackbox. Sur- 
prisingly with N = 1 it was possible to observe a correlation of ±1.4. Has 
the chameleon multiplied the outcome ±1 in one wing of the experiment by 
$\sqrt{2}$? A later version of the program also allowed the outcome "no detection"
and though the author still claims categorically that Bell was wrong,
the main thrust of the paper seems now to be to model actual experiments, 
which as is abundantly known suffer seriously from the detection loophole. 

The martingale results which I have outlined above were derived in or- 
der to determine how large N should be, so that I would have no danger of 
losing a public bet with Accardi, that his computer programs could not vi- 
olate the Bell-CHSH inequalities in an Aspect-type experiment, which is to 
say an experiment with repeated random choice of settings. Since he was to 
be totally free in what he put on his computers I could not use standard sta- 
tistical methods to determine a safe sample size. Fortunately the martingale 
came to my rescue. 

Hess and Philipp and non-elements-of-reality 

I first became aware of the contributions of Hess and Philipp through an 
article in the science supplement of a reliable Dutch newspaper. Einstein 
was right after all. It intrigued me to discover that there was a fatal time-
loophole in Bell's theorem, when I had just succeeded in fixing this loophole 
myself in order to make a safe bet with Accardi. 

The first publications by these authors appeared in print in somewhat 
mangled form, since the journal had requested that the paper be reduced and 
cut in two pieces. Some notational confusions and mismatches made it very 
difficult to follow the arguments. On the one hand the papers contained a 
long verbal critique of Bell, on the lines that correlations at a distance can 
easily be caused by synchronous systematic variation in other factors. This is 
Bell's own story of the frequency of heart attacks in distant French cities. On 
the other hand the papers contained a highly complex mathematical model 
which was supposed to represent a local realistic reproduction of the singlet 
correlations. Unfortunately the authors chose only to verify some necessary 
conditions for the locality of their model. Hidden variables which in the 
model were supposed to "belong" to one measurement station or the other 
were shown to be statistically independent of one another. 

In the latest publication Hess and Philipp have given a more transparent 
specification of their model, and in particular have recognised the important 
role played by one variable which in their earlier work was either treated as a 
mere index or even suppressed from the notation altogether. This variable is 
supposed to represent some kind of micro-time variable which is resident in 
both wings of the experiment. It turns out to have a probability distribution 
which depends on the measurement settings in both wings of the experiment. 
The authors implicitly recognise that it is non-local but christen it a "non- 
element-of-reality". Thus non-local hidden variables are fine, we just should 
not think of them as being real. They wisely point out that it seems to be a 
very difficult problem to decide which variables are elements of reality and
which are not. In Appendix 2, I give a simplified version of their model.

't Hooft and predetermination 

't Hooft notes that at the Planck scale experimenters will not have much free- 
dom to choose settings on a measurement apparatus. Thus Bell's position 2 
gives license to search for a classical, local, deterministic theory behind the 
quantum mechanical theory of the world at that level. So far so good. 

However, presumably the quantum mechanical theory of the world at 
the Planck scale is the foundation from which one can derive the quantum 
mechanical theory of the world at levels closer to our everyday experience. 
Thus, his classical, local and deterministic theory for physics at the Planck 
scale is a classical, local and deterministic theory for physics at the level of 
present day laboratory experiments testing Bell's theorem. It seems to me
that there are now two positions to take. The first one is that there is, also 
at our level, no free choice. The experimenter thinks he is freely choosing 
setting label number 2 in Alice's wing of the experiment, but actually the
photons arriving simultaneously in the other wing of the experiment, or the 
stuff of the measurement apparatus there, "know" this in advance and capitalize
on it in a very clever way: they produce deviations from the Bell inequality,
though not larger than Cirel'son's quantum bound of $2\sqrt{2}$ (they are,
after all, bound by quantum mechanics). But we have no way of seeing that 
our "random" coin tosses are not random at all, but are powerfully correlated 
with forever hidden variables in measurement apparatus far away. I find it 
inconceivable that there is such powerful coordination between such totally 
different physical systems (the brain of the experimenter, the electrons in the 
photodetector, the choice of a particular number as seed of a pseudo-random 
number generator in a particular computer program) that Bell's inequality 
can be resoundingly violated in the quantum optics laboratory, but nature as 
a whole appears "local", and randomizers appear random. 

Now "free choice" is a notion belonging to philosophy and I would 
prefer not to argue about physics by invoking a physicist's apparently free 
choice. It is a fact that one can create in a laboratory something which looks 
very like randomness. One can run totally automated Bell-type experiments 
in which measurement settings are determined by results of a chain of sep- 
arate physical systems (quantum optics, mechanical coin tossing, computer 
pseudo-random number generators). The point is that if we could carry out 
a perfect and successful Bell-type experiment, then if local realism is true an
exquisite coordination persists throughout this complex of physical systems 
delivering precisely the right measurement settings at the two locations to 
violate Bell's inequalities, while hidden from us in all other ways. 

There is another position, position 5: the perfect Bell-type experiment 
cannot be made. Precisely because there is a local realistic hidden layer 
to the deepest layer of quantum mechanics, when we separate quantum- 
entangled physical systems far enough from one another in order to do sep- 
arate and randomly chosen measurements on each, the entanglement will 
have decayed so far that the observed correlations have a classical explana- 
tion. Loopholes are unavoidable and the singlet state is an illusion. 

Khrennikov and exotic probability theories 

In a number of publications Khrennikov contrasts a classical probability
view which he associates with Kolmogorov, with a so-called contextualist 
viewpoint. He also contrasts the Kolmogorov point of view and the von
Mises (frequentist). Furthermore, he has suggested that the resolution of 
Bell's paradox might be found in some non-standard probability theory, for 
instance p-adic. A rationale for this might be that stabilization of relative 
frequencies might not be a fact at the micro-level, hence no classical proba- 
bility theory can be applied there. 

Let me first make some remarks on the question of whether an exotic 
probability theory might explain away the Bell paradox. Though there is no 
direct relation, I am reminded of an earlier attempt by Pitowsky (1989) to
resolve all paradoxes through adopting a mathematically very sophisticated 
and non-standard version of probability theory, in that case, by allowing 
non-measurable random variables and events. If events are not measurable, 
and moreover have lower and upper probabilities equal to zero and one
respectively, then relative frequencies do not converge, but can have all values
between 0 and 1 as points of accumulation. This allows Pitowsky (1989) to
wriggle out of the constraint of Bell's inequality. Each probability concern- 
ing hidden variables can take any value. 

Now experimentalists know that relative frequencies of macroscopic out- 
comes do tend to converge under many repetitions of a carefully controlled 
experiment, whether in quantum mechanics or not. The proof of Bell's the- 
orem as I give it does not require stabilization of relative frequencies of 
some further unspecified micro-variables, but of joint relative frequencies 
of macroscopic variables, both "what was actually measured" and of "what 
might have been measured". Moreover it assumes that the stabilized val- 
ues respect, by showing statistical independence, the physical independence 
which follows from locality. The result of a coin toss on one side of the
Innsbruck campus is not correlated with a photon measurement on the other side.
In the case of Pitowsky (1989), exotic probability does not "explain" at all;
what is called an explanation is sleight-of-hand hidden under impressive 
(but very specialistic) mathematics. At best, the explanation would imply a 
physics which is even more weird than quantum mechanics. 

I have yet to study the case for p-adic probability carefully, but a priori I 
am highly sceptical. 

Regarding Kolmogorov and von Mises I have already remarked that I do
not see any opposition between alternative views of probability here.
Kolmogorov merely describes probability, von Mises tries to explain it.
Kolmogorov's theory is mere accountancy. The underlying variable $\omega$ of a
Kolmogorovian probability space is not a physical cause, a hidden variable, it
is merely a label of a possible outcome. Naturally, in classical physical
systems, there is a many-to-one correspondence between initial conditions and
distinguishable final conditions, so one could think of $\omega$ as being an element
of a big list of initial configurations. But this is not obligatory and, outside of
physics, it is not usual. See Kolmogorov (1933) for very clear descriptions
of what $\omega$ is supposed to stand for and how probability can be interpreted. I
think you will find that Kolmogorov was definitely a contextualist.



Kracklauer and the bombs under Bell's theory 

According to Kracklauer, one counter-example is enough to explode a the- 
orem. Not content with one bomb he has come up with local realistic ex- 
planations of a large number of celebrated experiments in quantum mechan- 
ics. Unfortunately, showing that a long list of historical experiments did not 
prove what various experimenters and interpreters claim, does not prove a 
certain theory, which inspired those experiments, wrong. 

On the theoretical side he also has a large number of arguments, but in 
my opinion none is persuasive. One is that in real experiments there are not 
binary outcomes but there is macroscopic photoelectric current. But one can 
convert a continuous current to a binary outcome (does it exceed a given 
threshold or not). Bell's argument just requires that binary outcomes are 
output and analysed; any intermediate steps are irrelevant. 

Another argument is that photons do not actually exist. This certainly 
is a serious point regarding Bell-type experiments in quantum optics, and is 
connected to the Fifth Position, to which I will return. As a mathematician 
I have to admit that the word "photon" is perhaps no more than just a word. 
What we call a photon is associated with certain mathematical objects in 
certain theories of "electro-magnetic radiation" and associated with point- 
like events which one can identify in various experiments involving "light". 
Mathematics itself is just a game of logical manipulations of distinct sym- 
bols on pieces of paper. Bell was careful to describe his decisive experiment 
in terms of macroscopic every-day laboratory objects, and avoided any use 
of words like "particle" which only have a meaning within an existing the- 
ory. 

Another argument is that the mathematics of spin does not involve Planck's 
constant hence does not involve quantum mechanics. The transfer of EPR 
to the realm of spin half or of photons is lethal. However, it seems to me 
that quantum mechanics is as much about incompatible observables as about 
Planck's constant. 

Finally, Kracklauer enlists the support of Jaynes (1989), who claimed
to have resolved all probability paradoxes in physics by proper use of
probability theory. According to E.T. Jaynes, Bell's factorization was an improper
use of the chain rule for conditional probability. Apparently Jaynes did not
recognise an uncontroversial use of the notion of conditional independence. 
Suppose I have a large collection of pairs of dice. The two dice in each pair 
are identical. However half the pairs have two 1's, two 2's and two 3's on
their faces, the other half have two 4's, two 5's and two 6's. Call these Type 1
and Type 2 dice. Naturally if many times in succession, we take a random
pair of dice, send one to Amsterdam and the other to Bagdad, and toss each
die once, there will be a strong correlation between the outcomes in the two
locations. Denote by X and Y the outcomes at the two locations, and by T 
the type of the dice. Suppose moreover that the dice-throwing apparatus in 
Amsterdam and Bagdad each depend on a setting, called a and b, which is 
chosen by a technician in each laboratory. (The result of the setting is to bias 
the outcome in a way which I will not further specify here.) Bell calculates 
as follows: 



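$$
\begin{aligned}
\mathrm{E}_{ab}\{XY\} &= \mathrm{E}\{\,\mathrm{E}_{ab}\{XY \mid T\}\,\} \\
&= \Pr\{T = 1\}\,\mathrm{E}_{ab}\{XY \mid T = 1\} + \Pr\{T = 2\}\,\mathrm{E}_{ab}\{XY \mid T = 2\} \\
&= \Pr\{T = 1\}\,\mathrm{E}_{a}\{X \mid T = 1\}\,\mathrm{E}_{b}\{Y \mid T = 1\} \\
&\quad + \Pr\{T = 2\}\,\mathrm{E}_{a}\{X \mid T = 2\}\,\mathrm{E}_{b}\{Y \mid T = 2\}.
\end{aligned}
$$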
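The dice calculation can be carried out exactly. The sketch below is a simplified illustration with hypothetical names: it ignores the settings $a$ and $b$ (the text leaves their biasing effect unspecified), so each die is tossed fairly, and it assumes the two tosses are independent once the type $T$ is known. It compares the direct expectation of $XY$ with the factorized form, and then shows that $X$ and $Y$ are nevertheless correlated when $T$ is not known.

```python
from fractions import Fraction
from itertools import product

FACES = {1: (1, 2, 3), 2: (4, 5, 6)}              # values on Type 1 and Type 2 dice
PR_TYPE = {1: Fraction(1, 2), 2: Fraction(1, 2)}  # half the pairs are of each type

def mean(values):
    """Exact average of a sequence of integers."""
    return Fraction(sum(values), len(values))

# Direct route: condition on T, then average XY over the two independent tosses.
direct = sum(
    PR_TYPE[t] * mean([x * y for x, y in product(FACES[t], repeat=2)])
    for t in FACES
)

# Factorized route: E{X | T} times E{Y | T}, weighted by Pr{T}.
factorized = sum(PR_TYPE[t] * mean(FACES[t]) * mean(FACES[t]) for t in FACES)

# Unconditional mean of one toss, and the covariance between the two locations.
e_x = sum(PR_TYPE[t] * mean(FACES[t]) for t in FACES)
covariance = direct - e_x * e_x

print(direct, factorized, covariance)
```

Both routes give the same expectation, while the covariance is strictly positive: conditional independence given $T$ is entirely compatible with strong correlation between Amsterdam and Bagdad.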



Jaynes prefers to consider probabilities rather than expectations; that is fine. He
points out that the mere fact that our probability of seeing a particular value 
for X is immediately changed when we are told the outcome of Y, does not 
mean any spooky action at a distance (as Bell also many times explained). 
He is also willing to apply the definition of conditional probability to write 



$$
\Pr\nolimits_{ab}\{X = x, Y = y \mid T = t\}
= \Pr\nolimits_{ab}\{X = x \mid Y = y, T = t\}\,\Pr\nolimits_{ab}\{Y = y \mid T = t\}
$$

but then refuses to admit

$$
\begin{aligned}
\Pr\nolimits_{ab}\{X = x \mid Y = y, T = t\} &= \Pr\nolimits_{a}\{X = x \mid T = t\}, \\
\Pr\nolimits_{ab}\{Y = y \mid T = t\} &= \Pr\nolimits_{b}\{Y = y \mid T = t\},
\end{aligned}
$$

going on to say that Bell's theorem only prohibits Bell's kind of local hidden
variable models, not all. He does not make any attempt to specify what he
understands by a local model, and expresses great surprise at very new results
of Steve Gull, presented at the same conference as Jaynes' own paper,
in which a computer network metaphor is introduced and where it is shown
that the singlet correlations cannot be simulated on such a network! (Steve
Gull faxed me his two pages of notes on this, which he likes to use as an
examination exercise. His proof uses Fourier analysis.) Jaynes thought that it
would take another 30 years to understand Gull's work, just as it had taken 
the world 20 years to understand Bell's (the decisive understanding having 
just come from E.T.). I am not impressed. 

Bell's use of probability language was in 1964 still a bit clumsy. Jaynes' 
work led him to a strong sense that any probability paradox in physics is 
most likely the result of muddled thinking. I suspect that Jaynes was so 
confident of this general rule that he made no attempt to understand Bell's 
argument and consequently completely missed the point. 

Volovich and the fifth position 

Volovich's recent work shows that in an EPR type context of the state of two 
entangled particles propagating in three-dimensional space, quantum me- 
chanics itself would prohibit a loophole free test of local realism. Basically, 
particles will be lost with a too large probability, and the detection loophole 
is present. 

In my opinion it would be interesting to find out if this is generic. How- 
ever one must bear in mind that Bell's theorem is not dependent on a partic- 
ular kind of physical scenario (for instance, polarization of entangled pho- 
tons). The mathematical analysis must be carried out at a much more funda- 
mental level in order to show that no physical system consisting of two well 
separated subsystems can evolve into a sufficiently entangled state by any 
means whatsoever.

I would rather expect progress here to come from 't Hooft's programme: 
show that quantum mechanics at the Planck scale has a local realistic ex- 
planation, show that quantum mechanics at our scale is a consequence, and 
hence that it too is constrained by local realism. 

Alternatively progress will come from experiment: someone does carry 
out a loophole free Bell-CHSH type experiment, or does factor large integers 
in no time at all using a quantum computer. 

6. Last Word 

Tossing a coin, shuffling a pack of cards, picking a ball from an urn, are
classical paradigms of randomness. Moreover all these experiments are well 
understood both from a physical and from a mathematical point of view. We 
understand perfectly well how small variations in initial conditions are mag- 
nified exponentially and result in quite unpredictable macroscopic results.
On the basis of physical symmetries we can propose uniform probability 
distributions over initial conditions, when listed appropriately, and can use 
this to predict the probabilities of macroscopic outcomes, for instance of bi- 
ased roulette wheels. We understand moreover that the probabilities of the 
macroscopic outcomes are remarkably robust to the probability distribution 
of initial conditions. Finally, the probability conclusions are quite indepen- 
dent of the flavour of probability interpretation. 
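
The robustness claim can be illustrated numerically. What follows is a
minimal sketch, not a physical model: it assumes a hypothetical idealised
coin whose outcome is the parity of floor(k * v), where v is an initial
condition and k a large magnification factor. The rule, the function name
heads_probability, the value of k and the three input distributions are
all chosen here purely for illustration.

```python
import math
import random


def heads_probability(draw_initial, k=1000.0, n=100_000, seed=0):
    """Estimate P(heads) when the outcome is the parity of floor(k * v).

    draw_initial(rng) returns one initial condition v >= 0; k is the
    (assumed) factor by which small differences in v are magnified.
    """
    rng = random.Random(seed)  # fixed seed: the run is exactly reproducible
    heads = 0
    for _ in range(n):
        v = draw_initial(rng)
        if math.floor(k * v) % 2 == 0:  # even number of half-turns: heads
            heads += 1
    return heads / n


# Three very different (hypothetical) distributions of the initial condition.
distributions = {
    "uniform on [2, 3]": lambda rng: rng.uniform(2.0, 3.0),
    "exponential, mean 1": lambda rng: rng.expovariate(1.0),
    "normal(5, 1), folded": lambda rng: abs(rng.gauss(5.0, 1.0)),
}

estimates = {name: heads_probability(draw) for name, draw in distributions.items()}

if __name__ == "__main__":
    for name, p in estimates.items():
        print(f"{name:>22}: P(heads) ~ {p:.3f}")
```

All three estimates should lie close to 1/2. With a small magnification
factor (say k = 1) they need not agree at all; the insensitivity to the
input distribution appears only once the magnification is large.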

Actually, generating a pseudo-random number on a computer is no dif- 
ferent, except that the fine control which we can impose on initial conditions 
and on each intermediate step means that the result is exactly reproducible. 
But one can also buy a coin-tossing apparatus which so precisely fixes the 
initial velocity and angular momentum (among other factors) of the coin 
being tossed, that (unless one is unfortunate and chooses initial conditions 
close to the boundary between "heads" and "tails") the coin falls the 
same way (almost) every time. 

That statistical independence holds when well separated physical systems
are each used to generate randomness is no harder to understand. It would
take an extraordinarily exquisite coordination between the number of times
a pack of cards is shuffled and the force used to spin a coin into the air
to produce any degree of correlation in their outcomes.

These considerations mean, for me, that Bell's theorem has more 
or less nothing to do with interpretations of probability. Classical physi- 
cal randomness, and classical physical independence, are what are at stake. 
My conclusion (excluding the fifth position) is that quantum mechanics is 
definitely non-classical. 

In order to establish that quantum mechanics is non-classical, we had to 
assume that physical independence between randomization devices at sep- 
arate locations in space is possible. We had to assume a degree of control 
on the amount of information passing from one physical system to another: 
when we press the button labelled "1" on one of the measurement devices, 
only the fact that it was that button and not the other is important for the 
subsequent physics, even though actually we exert more or less pressure, 
for a longer or shorter time, and thereby could unbeknown to us be intro- 
ducing information from other locations and from the distant past into the 
apparatus. Bell's conditional independence assumption is a way to express 
the physical intuition, that even though this might introduce more statistical 
variation into the outcome, it cannot carry information from the other wing 
of the experiment, concerning the randomization outcome there. 

I find it fascinating that in order to prove that quantum mechanics is 
intrinsically probabilistic (the outcomes cannot be traced back to variation
in initial conditions) we must assume that we can ourselves generate
randomness. And in order to demonstrate the kind of non-separability implied
by entanglement, we have to assume control and separation of the physical 
systems which we use in our experiments. 

Appendix 1: Weihs' data 

                      b = 1     b = 1     b = 2     b = 2
                      y = +1    y = -1    y = +1    y = -1

  a = 1    x = +1       313      1728      1636       179
  a = 1    x = -1      1978       351       294      1143
  a = 2    x = +1       418      1683       269      1100
  a = 2    x = -1      1578       361      1386       156

The table shows the numbers of occurrences of each of the 16 possible values 
of (a, b, x, y); see Weihs' 1999 thesis, page 113, available from his personal 
web pages at www.quantum.at. The grand total is N = 14 573. 
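
As a worked check of the table, the following sketch recomputes the grand
total and the four empirical correlations, and combines them into a
CHSH-type quantity. The counts are those of the table; the dictionary
layout, the name correlation and the sign convention (the correlation for
settings (a, b) = (1, 2), the only positive one in these data, enters with
the opposite sign to the other three) are assumptions made for this
illustration.

```python
# counts[(a, b)][(x, y)] = number of trials with settings (a, b)
# and outcomes (x, y), copied from the table above.
counts = {
    (1, 1): {(+1, +1): 313, (+1, -1): 1728, (-1, +1): 1978, (-1, -1): 351},
    (1, 2): {(+1, +1): 1636, (+1, -1): 179, (-1, +1): 294, (-1, -1): 1143},
    (2, 1): {(+1, +1): 418, (+1, -1): 1683, (-1, +1): 1578, (-1, -1): 361},
    (2, 2): {(+1, +1): 269, (+1, -1): 1100, (-1, +1): 1386, (-1, -1): 156},
}


def correlation(cell):
    """Empirical mean of the product x * y within one pair of settings."""
    total = sum(cell.values())
    return sum(x * y * n for (x, y), n in cell.items()) / total


grand_total = sum(sum(cell.values()) for cell in counts.values())
E = {ab: correlation(cell) for ab, cell in counts.items()}

# CHSH-type combination: three correlations of one sign, one of the other.
S = E[(1, 2)] - E[(1, 1)] - E[(2, 1)] - E[(2, 2)]

if __name__ == "__main__":
    print("grand total:", grand_total)
    for ab in sorted(E):
        print(f"E{ab} = {E[ab]:+.4f}")
    print(f"S = {S:.4f}")
```

On these counts the combination comes out near 2.73, above the bound of 2
which local realism imposes on the corresponding theoretical quantity.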



Appendix 2: A local model of the singlet correlations 

I present a caricature of the Hess-Philipp model, quant-ph/0212085.
The caricature has all those properties on the basis of which Hess and
Philipp claimed locality for their model. However, the caricature is
blatantly non-local. This makes it clear that Hess and Philipp are only
checking necessary conditions, not sufficient conditions, for locality. In
my construction I will only consider planar settings (orientations), and
measure angles as fractions of $2\pi$; thus settings $a$, $b$ become points
in the unit interval $[0, 1]$ with endpoints identified. I am going to
construct random variables $R$, $\Lambda^*$, $\Lambda^{**}$, $\Lambda$ whose
joint probability distribution is allowed by Hess and Philipp to depend on
$a$ and $b$. Actually, my $R$ will be a 2-vector. $R$ is supposed to be some
kind of microscopic (i.e., hidden to the experimenter) time variable.
$\Lambda^*$ and $\Lambda^{**}$ are station variables. $\Lambda$ is a source
variable, transmitted to both stations.

Let $a$ and $b$ be given. Let $\Lambda^*$, $\Lambda^{**}$, and $\Lambda$ be
independent random variables, each uniformly distributed on $[0, 1]$. Define
$R = (R_1, R_2)$ as follows:

$$R_1 = (\Lambda^{**} + a) \bmod 1, \tag{1}$$

$$R_2 = (\Lambda^{*} + b) \bmod 1. \tag{2}$$

As required by HP, conditional on $R$, the pair $(\Lambda^*, \Lambda^{**})$
is independent of $\Lambda$. All further independence properties desired by
HP are trivially satisfied. However,

$$b = (R_2 - \Lambda^{*}) \bmod 1, \tag{3}$$

$$a = (R_1 - \Lambda^{**}) \bmod 1. \tag{4}$$

Consequently, given $R$ and $\Lambda^*$ one can reconstruct $b$; given $R$
and $\Lambda^{**}$ one can reconstruct $a$ (and, if $b$ is also known,
$\Lambda^*$).

Finally, let $A = A(\Lambda^*, \Lambda, R, a)$ and
$B = B(\Lambda^{**}, \Lambda, R, b)$ be functions taking values in
$\{-1, +1\}$. From the given arguments to $A$ and $B$, the missing station
settings $b$ and $a$, respectively, can be reconstructed. From $a$, $b$ and
$\Lambda$ one can construct a pair of binary random variables with joint
probability distribution depending in any way one likes on $a$ and $b$. In
particular one can arrange to reproduce the singlet correlations.
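
The construction can be checked numerically. The following is a minimal
sketch under stated assumptions, not the Hess-Philipp model itself: the
two station variables and the source variable are drawn as independent
uniforms, R is built from equations (1) and (2), and the second station
recovers the distant setting through equation (4). The particular outcome
functions, the names lam_star, lam_2star, lam, and the target correlation
-cos(2*pi*(a - b)) for the singlet state are choices made here for
illustration; many other choices of A and B would do.

```python
import math
import random


def reconstruct_settings(lam_star, lam_2star, r):
    """Equations (3) and (4): recover (a, b) from R and the station variables."""
    a = (r[0] - lam_2star) % 1.0
    b = (r[1] - lam_star) % 1.0
    return a, b


def outcome_A(lam_star, lam, r, a):
    # Station 1 could recover b from (r, lam_star) via equation (3);
    # this particular choice of A does not need it.
    return 1 if lam < 0.5 else -1


def outcome_B(lam_2star, lam, r, b):
    a = (r[0] - lam_2star) % 1.0           # equation (4): distant setting
    p_opposite = (1.0 + math.cos(2.0 * math.pi * (a - b))) / 2.0
    x = 1 if lam < 0.5 else -1             # same sign as station 1's outcome
    u = (2.0 * lam) % 1.0                  # uniform, independent of x
    return -x if u < p_opposite else x


def simulate(a, b, n=100_000, seed=1):
    """Empirical mean of A * B; settings are fractions of a full turn."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        lam_star, lam_2star, lam = rng.random(), rng.random(), rng.random()
        r = ((lam_2star + a) % 1.0, (lam_star + b) % 1.0)  # equations (1), (2)
        total += outcome_A(lam_star, lam, r, a) * outcome_B(lam_2star, lam, r, b)
    return total / n


if __name__ == "__main__":
    for a, b in [(0.0, 0.125), (0.25, 0.125), (0.0, 0.5)]:
        target = -math.cos(2.0 * math.pi * (a - b))
        print(f"a={a}, b={b}: simulated {simulate(a, b):+.3f}, target {target:+.3f}")
```

The simulated correlations should track the target for any pair of
settings, which is possible only because outcome_B uses a setting chosen
at the other station.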

To prove that both the HP model and this caricature are non-local, it
suffices to observe that they reproduce the singlet correlations in a realistic
fashion, and therefore by Bell's theorem cannot be local-realistic. However,
according to Hess and Philipp this conclusion is short-sighted. Obviously,
$R$ is not an element of reality! The only elements of reality in my model are
$\Lambda$, $\Lambda^*$ and $\Lambda^{**}$. They are evidently local, so my
model is local, after all.